Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 5490 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 386.1 KiB |
| Average record size in memory | 72.0 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 1 |
Mực nước KG is highly overall correlated with Mực nước LT and 3 other fields | High correlation |
Mực nước LT is highly overall correlated with Month and 2 other fields | High correlation |
Mực nước DH is highly overall correlated with Mực nước KG and 1 other field | High correlation |
Lượng mưa KG is highly overall correlated with Mực nước KG and 2 other fields | High correlation |
Lượng mưa LT is highly overall correlated with Mực nước KG and 2 other fields | High correlation |
Lượng mưa DH is highly overall correlated with Lượng mưa KG and 1 other field | High correlation |
Month is highly overall correlated with Mực nước LT | High correlation |
Mực nước DH has 67 (1.2%) zeros | Zeros |
Lượng mưa KG has 2489 (45.3%) zeros | Zeros |
Lượng mưa LT has 2548 (46.4%) zeros | Zeros |
Lượng mưa DH has 2493 (45.4%) zeros | Zeros |
Reproduction
| Analysis started | 2022-12-03 15:18:51.330426 |
|---|---|
| Analysis finished | 2022-12-03 15:19:03.213519 |
| Duration | 11.88 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
Year
Real number (ℝ)
| Distinct | 45 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1998 |
| Minimum | 1976 |
| Maximum | 2020 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | 1976 |
|---|---|
| 5-th percentile | 1978 |
| Q1 | 1987 |
| median | 1998 |
| Q3 | 2009 |
| 95-th percentile | 2018 |
| Maximum | 2020 |
| Range | 44 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 12.988356 |
|---|---|
| Coefficient of variation (CV) | 0.0065006787 |
| Kurtosis | -1.2011868 |
| Mean | 1998 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0 |
| Sum | 10969020 |
| Variance | 168.69739 |
| Monotonicity | Increasing |
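The range, interquartile range and coefficient of variation in these tables follow from the other reported figures. The sketch below recomputes them for Year, assuming the CV is the standard deviation divided by the mean. The helper name `derived_stats` is hypothetical and not part of pandas-profiling.

```python
def derived_stats(minimum, maximum, q1, q3, mean, std):
    """Recompute derived summary statistics from reported values."""
    return {
        "range": maximum - minimum,  # spread between the extremes
        "iqr": q3 - q1,              # spread of the middle 50% of values
        "cv": std / mean,            # relative dispersion (assumed definition)
    }


# Inputs copied from the Year tables in this report.
year = derived_stats(minimum=1976, maximum=2020, q1=1987, q3=2009,
                     mean=1998, std=12.988356)
print(year)
```

The same helper can be applied to any other numeric variable in the report to cross-check its Range, IQR and CV rows.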
Histogram with fixed size bins (bins=45)
| Value | Count | Frequency (%) |
| 1976 | 122 | 2.2% |
| 1999 | 122 | 2.2% |
| 2001 | 122 | 2.2% |
| 2002 | 122 | 2.2% |
| 2003 | 122 | 2.2% |
| 2004 | 122 | 2.2% |
| 2005 | 122 | 2.2% |
| 2006 | 122 | 2.2% |
| 2007 | 122 | 2.2% |
| 2008 | 122 | 2.2% |
| Other values (35) | 4270 |
| Value | Count | Frequency (%) |
| 1976 | 122 | |
| 1977 | 122 | |
| 1978 | 122 | |
| 1979 | 122 | |
| 1980 | 122 | |
| 1981 | 122 | |
| 1982 | 122 | |
| 1983 | 122 | |
| 1984 | 122 | |
| 1985 | 122 |
| Value | Count | Frequency (%) |
| 2020 | 122 | |
| 2019 | 122 | |
| 2018 | 122 | |
| 2017 | 122 | |
| 2016 | 122 | |
| 2015 | 122 | |
| 2014 | 122 | |
| 2013 | 122 | |
| 2012 | 122 | |
| 2011 | 122 |
Month
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 43.0 KiB |
| 10 | |
|---|---|
| 12 | |
| 9 | |
| 11 |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 1.7540984 |
| Min length | 1 |
Characters and Unicode
| Total characters | 9630 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
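The length and character figures for Month can be cross-checked from the category counts alone. The following is a minimal sketch, assuming the categories are stored as the strings shown in the Common Values table. The variable names are illustrative only.

```python
from collections import Counter

# Category counts copied from the Common Values table for Month.
month_counts = {"10": 1395, "12": 1395, "9": 1350, "11": 1350}

n_rows = sum(month_counts.values())
total_chars = sum(len(value) * count for value, count in month_counts.items())
mean_length = total_chars / n_rows

# Count how often each character occurs across all rows.
char_counts = Counter()
for value, count in month_counts.items():
    for ch in value:
        char_counts[ch] += count

print(n_rows, total_chars, mean_length)
print(char_counts.most_common())
```

Every category except "9" contains the character "1" (and "11" contains it twice), which is why "1" dominates the character tables.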
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 9 |
|---|---|
| 2nd row | 9 |
| 3rd row | 9 |
| 4th row | 9 |
| 5th row | 9 |
Common Values
| Value | Count | Frequency (%) |
| 10 | 1395 | |
| 12 | 1395 | |
| 9 | 1350 | |
| 11 | 1350 |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 5490 | |
| 0 | 1395 | 14.5% |
| 2 | 1395 | 14.5% |
| 9 | 1350 | 14.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9630 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 5490 | |
| 0 | 1395 | 14.5% |
| 2 | 1395 | 14.5% |
| 9 | 1350 | 14.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 9630 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 5490 | |
| 0 | 1395 | 14.5% |
| 2 | 1395 | 14.5% |
| 9 | 1350 | 14.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 9630 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 5490 | |
| 0 | 1395 | 14.5% |
| 2 | 1395 | 14.5% |
| 9 | 1350 | 14.0% |
Day
Real number (ℝ)
| Distinct | 31 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.754098 |
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 16 |
| Q3 | 23 |
| 95-th percentile | 29 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 8.8077587 |
|---|---|
| Coefficient of variation (CV) | 0.5590773 |
| Kurtosis | -1.1987167 |
| Mean | 15.754098 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.0027898929 |
| Sum | 86490 |
| Variance | 77.576614 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=31)
| Value | Count | Frequency (%) |
| 1 | 180 | 3.3% |
| 17 | 180 | 3.3% |
| 30 | 180 | 3.3% |
| 29 | 180 | 3.3% |
| 28 | 180 | 3.3% |
| 27 | 180 | 3.3% |
| 26 | 180 | 3.3% |
| 25 | 180 | 3.3% |
| 24 | 180 | 3.3% |
| 23 | 180 | 3.3% |
| Other values (21) | 3690 |
| Value | Count | Frequency (%) |
| 1 | 180 | |
| 2 | 180 | |
| 3 | 180 | |
| 4 | 180 | |
| 5 | 180 | |
| 6 | 180 | |
| 7 | 180 | |
| 8 | 180 | |
| 9 | 180 | |
| 10 | 180 |
| Value | Count | Frequency (%) |
| 31 | 90 | |
| 30 | 180 | |
| 29 | 180 | |
| 28 | 180 | |
| 27 | 180 | |
| 26 | 180 | |
| 25 | 180 | |
| 24 | 180 | |
| 23 | 180 | |
| 22 | 180 |
Mực nước KG
Real number (ℝ)
| Distinct | 428 |
|---|---|
| Distinct (%) | 7.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.4690546 |
| Minimum | 5.49 |
| Maximum | 12.22 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | 5.49 |
|---|---|
| 5-th percentile | 5.71 |
| Q1 | 5.98 |
| median | 6.24 |
| Q3 | 6.69 |
| 95-th percentile | 8.08 |
| Maximum | 12.22 |
| Range | 6.73 |
| Interquartile range (IQR) | 0.71 |
Descriptive statistics
| Standard deviation | 0.78896197 |
|---|---|
| Coefficient of variation (CV) | 0.12195939 |
| Kurtosis | 7.4953786 |
| Mean | 6.4690546 |
| Median Absolute Deviation (MAD) | 0.32 |
| Skewness | 2.3271686 |
| Sum | 35515.11 |
| Variance | 0.62246099 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6.13 | 70 | 1.3% |
| 6.09 | 63 | 1.1% |
| 6.07 | 63 | 1.1% |
| 6.18 | 62 | 1.1% |
| 6.15 | 61 | 1.1% |
| 6.01 | 60 | 1.1% |
| 5.85 | 60 | 1.1% |
| 6.19 | 59 | 1.1% |
| 5.95 | 59 | 1.1% |
| 6.05 | 58 | 1.1% |
| Other values (418) | 4875 |
| Value | Count | Frequency (%) |
| 5.49 | 2 | < 0.1% |
| 5.5 | 3 | |
| 5.51 | 5 | |
| 5.52 | 1 | < 0.1% |
| 5.53 | 3 | |
| 5.54 | 1 | < 0.1% |
| 5.55 | 3 | |
| 5.56 | 1 | < 0.1% |
| 5.57 | 3 | |
| 5.58 | 3 |
| Value | Count | Frequency (%) |
| 12.22 | 1 | |
| 11.99 | 1 | |
| 11.79 | 1 | |
| 11.67 | 1 | |
| 11.57 | 1 | |
| 11.4 | 1 | |
| 11.3 | 1 | |
| 11.15 | 1 | |
| 11.12 | 1 | |
| 11.04 | 1 |
Mực nước LT
Real number (ℝ)
| Distinct | 336 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.58364324 |
| Minimum | -0.5 |
| Maximum | 4.62 |
| Zeros | 28 |
| Zeros (%) | 0.5% |
| Negative | 646 |
| Negative (%) | 11.8% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | -0.5 |
|---|---|
| 5-th percentile | -0.17 |
| Q1 | 0.22 |
| median | 0.48 |
| Q3 | 0.82 |
| 95-th percentile | 1.72 |
| Maximum | 4.62 |
| Range | 5.12 |
| Interquartile range (IQR) | 0.6 |
Descriptive statistics
| Standard deviation | 0.58706116 |
|---|---|
| Coefficient of variation (CV) | 1.0058562 |
| Kurtosis | 3.6099695 |
| Mean | 0.58364324 |
| Median Absolute Deviation (MAD) | 0.29 |
| Skewness | 1.4338967 |
| Sum | 3204.2014 |
| Variance | 0.3446408 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.3 | 76 | 1.4% |
| 0.34 | 74 | 1.3% |
| 0.38 | 72 | 1.3% |
| 0.42 | 70 | 1.3% |
| 0.28 | 70 | 1.3% |
| 0.46 | 69 | 1.3% |
| 0.6 | 67 | 1.2% |
| 0.5 | 67 | 1.2% |
| 0.4 | 67 | 1.2% |
| 0.44 | 64 | 1.2% |
| Other values (326) | 4794 |
| Value | Count | Frequency (%) |
| -0.5 | 49 | |
| -0.48 | 2 | < 0.1% |
| -0.46 | 1 | < 0.1% |
| -0.45 | 4 | 0.1% |
| -0.44 | 5 | 0.1% |
| -0.43 | 1 | < 0.1% |
| -0.42 | 2 | < 0.1% |
| -0.4 | 1 | < 0.1% |
| -0.39 | 3 | 0.1% |
| -0.38 | 3 | 0.1% |
| Value | Count | Frequency (%) |
| 4.62 | 1 | |
| 4.2 | 1 | |
| 3.95 | 1 | |
| 3.74 | 1 | |
| 3.72 | 1 | |
| 3.61 | 2 | |
| 3.6 | 1 | |
| 3.53 | 1 | |
| 3.52 | 1 | |
| 3.39 | 1 |
Mực nước DH
Real number (ℝ)
| Distinct | 167 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.28808743 |
| Minimum | -0.24 |
| Maximum | 1.99 |
| Zeros | 67 |
| Zeros (%) | 1.2% |
| Negative | 386 |
| Negative (%) | 7.0% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | -0.24 |
|---|---|
| 5-th percentile | -0.03 |
| Q1 | 0.13 |
| median | 0.26 |
| Q3 | 0.4 |
| 95-th percentile | 0.7 |
| Maximum | 1.99 |
| Range | 2.23 |
| Interquartile range (IQR) | 0.27 |
Descriptive statistics
| Standard deviation | 0.24131484 |
|---|---|
| Coefficient of variation (CV) | 0.83764445 |
| Kurtosis | 4.563513 |
| Mean | 0.28808743 |
| Median Absolute Deviation (MAD) | 0.13 |
| Skewness | 1.4425409 |
| Sum | 1581.6 |
| Variance | 0.058232851 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.16 | 133 | 2.4% |
| 0.18 | 133 | 2.4% |
| 0.3 | 130 | 2.4% |
| 0.2 | 122 | 2.2% |
| 0.21 | 119 | 2.2% |
| 0.24 | 117 | 2.1% |
| 0.22 | 114 | 2.1% |
| 0.25 | 114 | 2.1% |
| 0.14 | 112 | 2.0% |
| 0.28 | 110 | 2.0% |
| Other values (157) | 4286 |
| Value | Count | Frequency (%) |
| -0.24 | 1 | < 0.1% |
| -0.22 | 2 | < 0.1% |
| -0.21 | 2 | < 0.1% |
| -0.2 | 2 | < 0.1% |
| -0.19 | 6 | |
| -0.18 | 1 | < 0.1% |
| -0.17 | 7 | |
| -0.16 | 8 | |
| -0.15 | 6 | |
| -0.14 | 7 |
| Value | Count | Frequency (%) |
| 1.99 | 1 | < 0.1% |
| 1.96 | 1 | < 0.1% |
| 1.76 | 1 | < 0.1% |
| 1.75 | 1 | < 0.1% |
| 1.73 | 1 | < 0.1% |
| 1.68 | 2 | |
| 1.66 | 1 | < 0.1% |
| 1.63 | 1 | < 0.1% |
| 1.58 | 2 | |
| 1.54 | 3 |
Lượng mưa KG
Real number (ℝ)
| Distinct | 785 |
|---|---|
| Distinct (%) | 14.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.912805 |
| Minimum | 0 |
| Maximum | 500 |
| Zeros | 2489 |
| Zeros (%) | 45.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.6 |
| Q3 | 10.775 |
| 95-th percentile | 74.82 |
| Maximum | 500 |
| Range | 500 |
| Interquartile range (IQR) | 10.775 |
Descriptive statistics
| Standard deviation | 35.095403 |
|---|---|
| Coefficient of variation (CV) | 2.5225253 |
| Kurtosis | 31.040805 |
| Mean | 13.912805 |
| Median Absolute Deviation (MAD) | 0.6 |
| Skewness | 4.783516 |
| Sum | 76381.3 |
| Variance | 1231.6873 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2489 | |
| 0.2 | 89 | 1.6% |
| 1 | 56 | 1.0% |
| 0.4 | 52 | 0.9% |
| 1.2 | 47 | 0.9% |
| 0.3 | 46 | 0.8% |
| 2 | 43 | 0.8% |
| 0.8 | 41 | 0.7% |
| 4 | 39 | 0.7% |
| 0.1 | 36 | 0.7% |
| Other values (775) | 2552 |
| Value | Count | Frequency (%) |
| 0 | 2489 | |
| 0.1 | 36 | 0.7% |
| 0.2 | 89 | 1.6% |
| 0.3 | 46 | 0.8% |
| 0.4 | 52 | 0.9% |
| 0.5 | 31 | 0.6% |
| 0.6 | 33 | 0.6% |
| 0.7 | 34 | 0.6% |
| 0.8 | 41 | 0.7% |
| 0.9 | 25 | 0.5% |
| Value | Count | Frequency (%) |
| 500 | 1 | |
| 396.8 | 1 | |
| 378.2 | 1 | |
| 320.9 | 1 | |
| 315.9 | 1 | |
| 314.9 | 1 | |
| 305.9 | 1 | |
| 300.7 | 1 | |
| 299.6 | 1 | |
| 297.4 | 1 |
Lượng mưa LT
Real number (ℝ)
| Distinct | 773 |
|---|---|
| Distinct (%) | 14.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.909126 |
| Minimum | 0 |
| Maximum | 686.6 |
| Zeros | 2548 |
| Zeros (%) | 46.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.4 |
| Q3 | 10.375 |
| 95-th percentile | 73.455 |
| Maximum | 686.6 |
| Range | 686.6 |
| Interquartile range (IQR) | 10.375 |
Descriptive statistics
| Standard deviation | 36.717592 |
|---|---|
| Coefficient of variation (CV) | 2.6398203 |
| Kurtosis | 48.796685 |
| Mean | 13.909126 |
| Median Absolute Deviation (MAD) | 0.4 |
| Skewness | 5.6253182 |
| Sum | 76361.1 |
| Variance | 1348.1816 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2548 | |
| 0.3 | 63 | 1.1% |
| 0.2 | 62 | 1.1% |
| 0.5 | 59 | 1.1% |
| 1 | 48 | 0.9% |
| 0.1 | 46 | 0.8% |
| 0.6 | 44 | 0.8% |
| 1.2 | 42 | 0.8% |
| 2 | 41 | 0.7% |
| 1.5 | 39 | 0.7% |
| Other values (763) | 2498 |
| Value | Count | Frequency (%) |
| 0 | 2548 | |
| 0.1 | 46 | 0.8% |
| 0.2 | 62 | 1.1% |
| 0.3 | 63 | 1.1% |
| 0.4 | 36 | 0.7% |
| 0.5 | 59 | 1.1% |
| 0.6 | 44 | 0.8% |
| 0.7 | 35 | 0.6% |
| 0.8 | 27 | 0.5% |
| 0.9 | 14 | 0.3% |
| Value | Count | Frequency (%) |
| 686.6 | 1 | |
| 437.8 | 1 | |
| 405.5 | 1 | |
| 397.6 | 1 | |
| 379.7 | 1 | |
| 371.3 | 1 | |
| 361.7 | 1 | |
| 338 | 1 | |
| 330.3 | 1 | |
| 318.6 | 1 |
Lượng mưa DH
Real number (ℝ)
| Distinct | 732 |
|---|---|
| Distinct (%) | 13.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.15518 |
| Minimum | 0 |
| Maximum | 746.9 |
| Zeros | 2493 |
| Zeros (%) | 45.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.2 |
| Q3 | 7.6 |
| 95-th percentile | 63.155 |
| Maximum | 746.9 |
| Range | 746.9 |
| Interquartile range (IQR) | 7.6 |
Descriptive statistics
| Standard deviation | 35.282837 |
|---|---|
| Coefficient of variation (CV) | 2.9026996 |
| Kurtosis | 70.716706 |
| Mean | 12.15518 |
| Median Absolute Deviation (MAD) | 0.2 |
| Skewness | 6.5966994 |
| Sum | 66731.94 |
| Variance | 1244.8786 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2493 | |
| 0.1 | 154 | 2.8% |
| 0.2 | 118 | 2.1% |
| 0.3 | 77 | 1.4% |
| 0.4 | 71 | 1.3% |
| 0.5 | 63 | 1.1% |
| 0.7 | 47 | 0.9% |
| 1 | 41 | 0.7% |
| 0.6 | 40 | 0.7% |
| 0.8 | 37 | 0.7% |
| Other values (722) | 2349 |
| Value | Count | Frequency (%) |
| 0 | 2493 | |
| 0.1 | 154 | 2.8% |
| 0.2 | 118 | 2.1% |
| 0.3 | 77 | 1.4% |
| 0.32 | 1 | < 0.1% |
| 0.4 | 71 | 1.3% |
| 0.5 | 63 | 1.1% |
| 0.6 | 40 | 0.7% |
| 0.7 | 47 | 0.9% |
| 0.8 | 37 | 0.7% |
| Value | Count | Frequency (%) |
| 746.9 | 1 | |
| 554.6 | 1 | |
| 414.6 | 1 | |
| 342.5 | 1 | |
| 341.9 | 1 | |
| 338.2 | 1 | |
| 330.5 | 1 | |
| 329 | 1 | |
| 320.4 | 1 | |
| 315.7 | 1 |
Auto
The auto setting selects an interpretable pairwise column metric according to the following mapping:
- Variable_type-Variable_type : Method, Range
- Categorical-Categorical : Cramer's V, [0,1]
- Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
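The mapping above can be read as a small dispatch rule. The sketch below is only an illustration of that rule; the function `auto_metric` and its string labels are hypothetical and do not reflect pandas-profiling's internal code.

```python
def auto_metric(type_a, type_b):
    """Return (method label, value range) for a pair of column types."""
    if type_a == "Numerical" and type_b == "Numerical":
        return "spearman", (-1, 1)
    # Any pair involving a categorical column uses Cramer's V; per the
    # mapping, a numerical column in such a pair is discretized first.
    return "cramers_v", (0, 1)


for pair in [("Numerical", "Numerical"),
             ("Numerical", "Categorical"),
             ("Categorical", "Categorical")]:
    print(pair, auto_metric(*pair))
```

In this dataset, Month is the only categorical column, so every pair that involves Month would fall under the Cramer's V branch.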
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
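The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is an illustrative sketch, not the report's implementation. The helper names are hypothetical, and ties are assumed to receive the average of the ranks they span.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: the correlation of the rank variables."""
    return correlation(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]   # nonlinear but strictly increasing
rho = spearman_rho(xs, cubes)
print(rho)
```

Because the cubes increase whenever x does, the ranks coincide and ρ is 1 up to floating-point rounding, even though the relationship is not linear.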
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
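A minimal sketch of this formula, together with a check of the invariance under changes in location and scale, is given below. The function name `pearson_r` and the sample values are illustrative assumptions, not data from the report.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their std deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # The 1/n factors of covariance and variances cancel, so they are omitted.
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Shifting and rescaling each variable (with positive scale) leaves r unchanged.
r_shifted = pearson_r([2 * x + 3 for x in xs], [0.5 * y - 1 for y in ys])

# An exact line gives r = 1 whatever its slope.
r_line = pearson_r(xs, [3 * x + 1 for x in xs])
print(r, r_shifted, r_line)
```

A negative scale factor on one variable would flip the sign of r but not its magnitude.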
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
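The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition; the function name is hypothetical. Library implementations such as `scipy.stats.kendalltau` default to a tie-corrected variant (τ-b), so results may differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

This brute-force loop is quadratic in the number of observations, which is fine for illustration but slow for the 5490 rows of this dataset.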
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
| | Year | Month | Day | Mực nước KG | Mực nước LT | Mực nước DH | Lượng mưa KG | Lượng mưa LT | Lượng mưa DH |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1976 | 9 | 1 | 5.79 | -0.10 | 0.02 | 0.0 | 0.0 | 0.0 |
| 1 | 1976 | 9 | 2 | 5.75 | -0.11 | 0.00 | 0.0 | 0.0 | 0.0 |
| 2 | 1976 | 9 | 3 | 5.73 | -0.12 | 0.00 | 0.0 | 0.0 | 0.0 |
| 3 | 1976 | 9 | 4 | 5.74 | -0.14 | -0.02 | 0.0 | 0.0 | 0.0 |
| 4 | 1976 | 9 | 5 | 5.74 | -0.14 | -0.01 | 0.0 | 12.0 | 0.0 |
| 5 | 1976 | 9 | 6 | 5.75 | -0.14 | -0.01 | 22.4 | 0.0 | 0.1 |
| 6 | 1976 | 9 | 7 | 5.76 | -0.12 | 0.01 | 0.0 | 0.0 | 0.0 |
| 7 | 1976 | 9 | 8 | 5.73 | -0.13 | 0.00 | 0.0 | 0.0 | 0.0 |
| 8 | 1976 | 9 | 9 | 5.71 | -0.14 | 0.02 | 0.0 | 0.0 | 0.0 |
| 9 | 1976 | 9 | 10 | 5.68 | -0.15 | 0.06 | 0.0 | 0.0 | 0.0 |
| | Year | Month | Day | Mực nước KG | Mực nước LT | Mực nước DH | Lượng mưa KG | Lượng mưa LT | Lượng mưa DH |
|---|---|---|---|---|---|---|---|---|---|
| 5480 | 2020 | 12 | 22 | 6.51 | 0.59 | 0.40 | 0.0 | 0.0 | 0.0 |
| 5481 | 2020 | 12 | 23 | 6.46 | 0.54 | 0.32 | 0.0 | 0.0 | 0.0 |
| 5482 | 2020 | 12 | 24 | 6.42 | 0.47 | 0.20 | 1.6 | 0.4 | 0.2 |
| 5483 | 2020 | 12 | 25 | 6.39 | 0.43 | 0.11 | 0.2 | 0.0 | 0.0 |
| 5484 | 2020 | 12 | 26 | 6.38 | 0.37 | 0.08 | 2.4 | 0.8 | 0.0 |
| 5485 | 2020 | 12 | 27 | 6.36 | 0.32 | 0.13 | 0.0 | 0.0 | 0.0 |
| 5486 | 2020 | 12 | 28 | 6.33 | 0.27 | 0.03 | 0.4 | 0.0 | 0.1 |
| 5487 | 2020 | 12 | 29 | 6.30 | 0.21 | 0.06 | 0.0 | 0.0 | 0.0 |
| 5488 | 2020 | 12 | 30 | 6.28 | 0.14 | 0.27 | 7.6 | 0.4 | 2.4 |
| 5489 | 2020 | 12 | 31 | 6.34 | 0.17 | 0.40 | 4.8 | 0.0 | 0.1 |